Internal computer performance monitoring by event sampling.



The disclosure provides event-controlled operations for an internal
hardware/software monitor for a processor in a data processing system.
It embeds and distributes in each processor at least one
instrumentation table unit (ITU) and event detection circuitry to
detect events and conditions for collecting event-sampled hardware
signals provided in the processor hardware in which the respective ITU
is embedded. Instrumentation measurement is controlled centrally in
the system. Sampling of the CPU signals for recording in the ITU is
done at (or a submultiple of) the occurrence rate of the selected
event(s) in the processor. The sampled signals are recorded in the
ITU. The ITUs of plural processors are asynchronously operated in a
system. The event-driven monitoring circuitry may be solely provided
in an ITU, or it may be superimposed on a timer-driven internal
instrumentation system of the type described in U.S.A. patent
4,590,550 in which the ITU is shared between event and timer driven
modes of operation. Branch-taken event monitoring is also included in
the disclosure.







0 257 241 A2




INTERNAL PERFORMANCE MONITORING BY EVENT SAMPLING 



Introduction 

The subject invention provides an event-driven method and means for
sampling hardware generated signals within a data processing system
based upon the occurrence of selected processing events and
conditions. The event signals are sampled and recorded by means which
are distributively interwoven within the CPU structure.



Background 

Many computer performance monitoring tools 
have been developed for evaluating the perfor- 
mance of computer systems. They have been con- 
ceived with various goals. Some are software, 
some hardware. Most hardware monitors have 
been separate from the system they measure, con- 
nected to it by manually inserted probes or by a 
plug interface. 

The traditional distinction among monitor types 
is between counters and recorders. The counter 
type counts the number of occurrences of each of 
a set of events, with the counted output normally 
representing some kind of meaningful information. 
The recorder type collects data about defined 
events on recording media. Later analysis of both 
types is usually needed to make the collected data 
intelligible. IBM monitors in the early 1960's mea- 
sured specified states in an IBM 7090 data pro- 
cessing system, such as: total CPU operation time; 
channel A operation time; channel B operation 
time; CPU busy with no I/O in process; tape equip- 
ment operation time, CPU in wait state; and card 
equipment operation time. 

Software monitors are widely used today but 
are limited to sampling data stored in memory. 
They cannot detect hardware states per se. Also, 
software monitors universally have the drawback of 
distorting the performance of the system they are 
measuring, because software monitors compete 
with the program being measured for use of the 
resources in the system. 

Monitoring functions may also be separated into two other subtypes:
those that sense hook instructions put into a program to assist a
monitoring operation, and those that sense some characteristic stored
by an unmodified running program. For example, a hook may be put into
a program routine so that the number of times the hook instruction
executed would indicate the number of times the routine was entered,
or the number of times the routine looped, depending on where the hook
was inserted. Both software and hardware monitors have been used to
sense and count the occurrences of a hook instruction. Also, monitor
functions that have been used to sense non-hook program
characteristics, for example, have counted the occurrence of specified
operation codes, or plotted the address distribution of accesses to
main storage. Non-operable instructions have been inserted as hooks to
cause a program interrupt that initiated the recording of an
identifying characteristic of the hook instruction. Further, the
Monitor Call (MC) instruction in the IBM System/370 architecture was
provided for use as a hook instruction insertable into program code.
Monitors and their use have been described in publications, such as a
book entitled "Evaluation and Measurement Techniques for Digital
Computer Systems", by M. E. Drummond, Jr., published in 1973 by
Prentice-Hall Inc., Englewood Cliffs, New Jersey.

Hardware monitors have been commercially sold, such as the Comten 8028
monitor and the Tesdata monitors. Software monitors have been in
public use for many years, such as the IBM "Resource Measurement
Facility" (RMF) and the Candle "Omegamon" program.

Examples of early patents on data processing system hardware monitors
externally connectable to a system are represented by U.S.A. patent
3,399,298 to H. M. Taylor entitled "Data Processing Profitability
Monitoring Apparatus"; U.S.A. patent 3,588,837 to R. D. Rash et al
entitled "System Activity Monitor"; and U.S.A. patent 4,068,304 to
W. F. Beausoleil et al entitled "Storage Hierarchy Performance
Monitor" (assigned to the same assignee as this application).

Another externally connected monitor is disclosed and claimed in
U.S.A. patent 4,435,759 to B. I. Baum et al (assigned to the same
assignee as this application). It provides a hardware monitor with a
software correlation characteristic. It is externally connected to a
uniprocessor or multiprocessor system to collect selected hardware
events in that system. It relates the collected hardware events to
causative software by simultaneously capturing and recording the
address of a potentially causative software instruction at the time
the hardware event is being sampled for collection. Collecting is done
on every Nth occurrence of a predetermined hardware event, to capture
the causative instruction address and one or more other hardware
states that can be correlated with the captured instruction address.
Hence, the captured instruction addresses relate the simultaneously
collected events to the software that potentially caused them. U.S.A.
patent 4,435,759 also discloses a set of monitors, externally
connected to the CPU's in a multiprocessor, with interconnections
between the monitors, all monitors also being connected to an external
control processor. The control processor issues read commands to the
plural external monitors to synchronize their capture and outputting
of events in the different CPU's being monitored. The control
processor in this way groups the captured events, by receiving and
recording each group as the set of events resulting from each read
command.

U.S.A. patent 4,590,550 (assigned to the same assignee as this
application) to J. H. Eilert et al entitled "Internally Distributed
Monitoring System" discloses a timer-driven performance monitor, built
into and distributed within the system it measures, thereby
eliminating some monitoring problems caused by the external location
of prior monitors.



Summary Of The Invention 

The present invention differs from patent 4,590,550 in providing an
event-driven monitor which, as discussed below, is internal to a
processor (which may be in a UP or MP) to allow a concentrated mode of
data capture especially suitable for measuring the performance of very
fast LSI hardware in relation to its driving software, including
during the developmental debug phases of the hardware or software.
Nevertheless, the subject invention is built on the base environment
defined in 4,590,550. The objectives of this invention are to:

1. Provide self-contained monitoring for a 
system in which LSI technology may prevent the 
attachment of an external hardware monitor. 

2. Embed instrumentation table units (ITU's) 
within each CPU in the system, to monitor an 
identifiable set of signals in the CPU. 

3. Enlarge the set of data representing sig- 
nals subject to collection, beyond previous moni- 
tors. 

4. Handle cross-processor monitoring cor- 
rectly with LSI technology, even though monitoring 
operates asynchronously between plural CPUs in 
an MP. 

5. Maintain hardware-software correlation by 
capturing the instruction address concurrent with 
other state data in each CPU. 

6. Capture virtual address-space identifier 
signals available as a condition in the hardware 
during program execution in a private virtual stor- 
age area while using virtual addressing. 

7. Maintain time-stamping for a collection of data, using the time of
day (TOD) of the most recently executed Trace instruction in relation
to data collections obtained by event sampling. The time-stamped trace
instructions give a higher level of software resolution than the
captured instruction addresses, by enabling a comment field in the
trace table entries to resolve a program identification that might
otherwise be ambiguous.

In patent 4,590,550, data collection is initially made in
instrumentation table units (ITU's) which are hardware arrays built
into the system in areas local to signals of interest. The ITU's in a
processor are also useable for collection of event-sampled signals of
this invention, as well as the prior timer-sampled signals in
4,590,550, which described ITU's as either dedicated to
instrumentation, or shared by instrumentation and diagnostic
operations. (Shared arrays have only one type of operation at any
given time.)

This invention applies to ITU's located in any hardware functional
element in a system, and component parts of the ITU may be distributed
within the element in proximity to the sources of event signals that
may be monitored. There may be more than one ITU in an element as an
optional design choice, such as separate ITU's in a CPU's instruction
unit, execution unit, and cache control unit (BCE), for sampling
locally derived event signals. While the invention may be used in all
types of system elements, hereinafter the CPU element is used as an
example.

Event measurement sampling by this invention is not done in the manner
described in patent 4,590,550, where sampling was done at regular time
intervals based on the user's selection of one from a number of
possible timer-sampling rates, e.g. every millisecond. This invention
may nevertheless be used alternatively with the invention in 4,590,550
by superimposing the structure for this invention on the structure
found in 4,590,550.

Also, the event-sampling used by this invention does not use the
synchronizing property of the periodic sampling pulses described in
4,590,550, which were distributed to all ITU's in the system to
synchronize the collection of signals. There, each ITU contained an
Instrumentation Trace Array (ITA) with multiple entries for the
recording of data signals. An initial reset was done on all ITU's to
set them to the same address, namely zero; and thereafter their
addresses were incremented synchronously by the common sampling pulse.
Equal entry positions (addresses) were therefore simultaneously
accessed in a corresponding entry in every ITU array, and the entry in
every ITU was incremented in unison by the next periodic sampling
pulse. The contents of corresponding ITU entries were the data
presented at the time the same sampling pulse switched the "current
ITA address" to its next entry. Thus, the corresponding entries in all
ITU's, i.e. those with the same address, had their data captured at
the same time throughout the system to maintain synchronization.

Also, in patent 4,590,550, ITU addressing could automatically wrap
back to address zero after the last location was filled. Also, an
output signal could be generated whenever the ITU address either
passed its half-filled address or wrapped back. On the occurrence of
either signal, the most recently filled half of all ITU's was moved to
output buffers for writing to an I/O device. Output thus could
alternate between the two halves of each ITU, under microcode control.
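
The wrap-and-alternate output scheme attributed above to patent 4,590,550 can be pictured with a small Python model. This is only an illustrative sketch: the class name, the default table size and the `flushed` list standing in for the output buffers are assumptions made for the example, not details taken from that patent.

```python
class WrappingTable:
    """Toy model of a table that wraps and hands off its most recently filled half."""

    def __init__(self, size=64):
        if size % 2 != 0:
            raise ValueError("size must be even so the table has two halves")
        self.size = size
        self.entries = [None] * size
        self.addr = 0        # current table address
        self.flushed = []    # halves moved to the (hypothetical) output buffers

    def record(self, sample):
        self.entries[self.addr] = sample
        self.addr = (self.addr + 1) % self.size   # wrap to zero after the last entry
        half = self.size // 2
        if self.addr == half:                      # address passed the half-filled point
            self.flushed.append(self.entries[:half])
        elif self.addr == 0:                       # address wrapped back to zero
            self.flushed.append(self.entries[half:])


table = WrappingTable(size=4)
for sample in range(6):
    table.record(sample)
```

In this model the output alternates between the lower and upper half, and a half is overwritten only after it has been handed off, provided the hand-off keeps pace with recording.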

The event-driven sampling in this invention has an advantage over the
timer-driven sampling in 4,590,550. If signals of interest occur
infrequently relative to the timer pulse, or occur at times other than
the timer pulses, then the recorded timer-driven samples will not
contain data useful for analysis of such events. For such events, a
timer-driven measurement run must go on for an excessively long time
in order for a sufficient number of "good" samples to be collected.
This can become an insuperable problem if:

1. The signal being studied is infrequent, such as cross-interrogate
hit. In this situation, almost all timer-driven samples may be
useless, since their "XI hit" state indicator may be off due to lack
of any XI hit occurring during a timer-driven sample pulse.

2. The system being measured (hardware 
and/or software) is at an operating level that may 
not remain stable, or even functional, for a long 
enough period to be time sampled. 

Event-driven sampling is provided in this specification as an
alternative instrumentation mode for operation within the general ITU
structure disclosed in U.S.A. patent 4,590,550. Event-driven sampling
provides a sampling pulse only when a selected event occurs, which may
occur at irregular times, instead of at the regular (periodic)
occurrences of the timer-driven sampling pulses that may not occur
during the occurrence of the event of interest. Thus, event-driven
sampling records no useless samples, and the measurement intervals may
be variable. Even if the event sampling rate is much slower than
timer-driven sampling, the yield in useful samples will in general be
significantly higher for infrequent asynchronous event occurrence.
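
As a rough illustration of the yield argument, the following Python sketch compares the two modes on a made-up run. The cycle numbers, the sampling period and the simplification that a timer sample is "useful" only when it lands on the exact cycle of a one-cycle event are all assumptions chosen for the example.

```python
def timer_samples(event_cycles, period, total_cycles):
    """Cycles at which a periodic sampling pulse happens to coincide with an event."""
    events = set(event_cycles)
    return [cycle for cycle in range(0, total_cycles, period) if cycle in events]


def event_samples(event_cycles):
    """Event-driven sampling: one sample exactly when each selected event occurs."""
    return list(event_cycles)


# Hypothetical run of 10,000 cycles with a one-cycle event at irregular times.
events = [137, 2048, 5000, 7777, 9001]
timer_hits = timer_samples(events, period=1000, total_cycles=10_000)
event_hits = event_samples(events)
```

Here the timer takes ten samples and only one of them coincides with the event, while event-driven sampling takes five samples and all five are useful.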

Event sampling in each CPU may be triggered 
by one type of event, or by plural types of events, 
either alone or in combination with one or more 
CPU conditions, in any logical combination of sig- 
nals which may be ANDed, ORed, and/or BUT- 
NOTed together to provide an event-recording sig- 
nal. 
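
A minimal Python sketch of such a combination follows. The signal names and the particular expression are hypothetical; they only show how AND, OR and BUT-NOT terms might be composed into a single event-recording signal.

```python
def but_not(a, b):
    """a BUT-NOT b: true when a is asserted and b is not."""
    return a and not b


def event_recording_signal(signals):
    """Hypothetical trigger: cache miss AND (address in range OR special latch set),
    BUT NOT while the access is an instruction fetch."""
    in_scope = signals["address_in_range"] or signals["special_latch"]
    return but_not(signals["cache_miss"] and in_scope, signals["instruction_fetch"])


example = {
    "cache_miss": True,
    "address_in_range": False,
    "special_latch": True,
    "instruction_fetch": False,
}
fires = event_recording_signal(example)
```

In hardware each term would be a latched signal and the expression a small gate network; the dictionary here merely stands in for the latched inputs.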



Event sampling is described for CPU elements herein, in which the
desired result is a correlation between program execution and behavior
elsewhere in the CPU. Essential internal CPU signals for this
correlation are derived within the instruction element (IE), execution
element (EE), and buffer control element (BCE).

Event sampling measurements need only be local within each CPU in a
multiprocessor. That is, event sampling (unlike timer sampling) does
not require measurement recording synchronization for different CPUs.
All active CPU's in an MP may asynchronously record events affecting
plural CPUs and nevertheless obtain any inter-CPU relationship. That
is, an event local to a single CPU, but affecting another CPU,
nevertheless can be handled as a local event. For example, if a
"cross-interrogate hit" is selected as an event for sampling
measurement, its recorded sampling on CPU 1 could include the CPU
identifier of the requesting CPU 2 and the time-of-day. Then, related
events can be determined from the data recorded from the ITUs of the
two CPUs without any synchronization between the event samplings
recorded for the different CPUs. Hence, the ITUs of the different CPUs
can fill asynchronously at different rates determined by the event
frequencies in the respective CPUs.
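
One way such after-the-fact correlation might look is sketched below in Python. The record layout, the field names and the time-of-day tolerance are assumptions made for the illustration; the specification does not prescribe a matching procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    cpu: int                               # CPU whose ITU recorded the sample
    tod: int                               # time-of-day value captured with the sample
    requesting_cpu: Optional[int] = None   # CPU identifier captured with an XI hit


def related_samples(samples_a, samples_b, cpu_b, tolerance):
    """Pair XI-hit samples that name cpu_b with cpu_b's own samples close in time."""
    pairs = []
    for a in samples_a:
        if a.requesting_cpu != cpu_b:
            continue
        for b in samples_b:
            if abs(a.tod - b.tod) <= tolerance:
                pairs.append((a, b))
    return pairs


cpu1_log = [Sample(cpu=1, tod=1000, requesting_cpu=2),
            Sample(cpu=1, tod=5000, requesting_cpu=3)]
cpu2_log = [Sample(cpu=2, tod=998), Sample(cpu=2, tod=4000)]
pairs = related_samples(cpu1_log, cpu2_log, cpu_b=2, tolerance=5)
```

The two logs have different lengths and were filled at different rates; only the recorded CPU identifier and time-of-day are used to relate them.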

CPU signals or states used in event sampling may be classified into
the categories of (1) event signals, and (2) condition signals, which
differ in their duration. Event signals have only a short duration,
e.g. a single machine cycle. Condition signals have a longer duration
lasting many cycles and may exist when events happen.

Event sampling may be made conditional on the current instruction
address falling within a given range (e.g. the PER registers in the
S/370 implementation); or on a special latch having been set by
special state instructions (e.g. diagnose or an emulation instruction)
placed in the code to signal entry to and exit from programmed
routines of interest. Condition control allows another dimension of
selectivity, in that event sampling can be turned on and off under
control of one or more conditions.

A special type of event sampling included in the subject invention is
the sampling of "successful branch" events, which differ from most
other events that are primarily studied to determine the hardware
characteristics of code execution (such as cache behavior). The
recording of "branch-taken" data documents the primary paths of
program control flow, and their sequence of events (rather than their
aggregate frequency) is the objective. Branch sampling places some
special constraints on implementation, so that branch sampling may be
handled as a separate instrumentation mode. Conceptually, however,
sampling on a branch event is clearly an instance of event sampling.

The recording of event sampling is controllably 
adjusted to every Nth event occurrence in which N 
is any integer, including one. For frequent events, 
N is greater than one for frequency reduction con- 
trol which is necessary to avoid filling the ITU array 
faster than its recorded content can be moved out 
to an output buffer, to prevent buffer overrun. 

The invention avoids a problem occurring when CPU state data is
sampled at fixed intervals (e.g. timer sampling to study the
interaction between programs and computer structure, or other
performance relationships). The problem with timer-sampling is that
some events do not happen at the time of a time-sampling pulse. The
required data occurrences may be so rare that a large number of time
samples must be taken over a very long period of time to get a
statistically meaningful number of samples that include the event of
interest. Or, the event may occur so frequently that the required data
rates are higher than feasible to record the number of samples taken.

Event-driven sampling solves these data rate problems by only
recording selected states of the machine for certain events, where an
event is the occurrence of a specified hardware signal, and may
indicate the fact that a branch instruction has been executed
effecting a change in the instruction stream. In other words, with
event-driven sampling, data recording is not done at arbitrary timer
intervals, but only whenever the event (specified state or true
branch) has occurred. Multiple sub-elements of a system may
participate in an event sampling run, but an event causes a sample to
be taken within only the prescribed CPU.

Selected conditions within the processor may 
be used to determine the sampling of selected 
event(s), such as the condition of the current in- 
struction being within the range of the S/370 PER 
(program-event recording) registers, or the con- 
dition that a unique state has been set in the CPU 
through a state-controlling instruction (e.g. SIE or 
diagnose in S/370XA). 

As previously stated, event-sampling rates may be controlled by
limiting the recording of event information to every Nth event. This
is useful in cutting down the amount of data to be collected for cases
where information integrity is not seriously disturbed by such data
loss. For the "instruction first cycle" (IFC) event, for example, N
might be 5 to 7 if the other sampled data of interest is happening
frequently enough; and for branch events, N might be 2.
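
The arithmetic behind a choice of N can be sketched in Python as follows. The function name and both rates are hypothetical; the sketch simply picks the smallest N that keeps the recording rate at or below an assumed rate at which samples can be moved out.

```python
import math


def smallest_n(event_rate, max_record_rate):
    """Smallest integer N (at least 1) with event_rate / N <= max_record_rate."""
    if event_rate <= 0 or max_record_rate <= 0:
        raise ValueError("rates must be positive")
    return max(1, math.ceil(event_rate / max_record_rate))


# Hypothetical figures: one million events per second against a readout
# path that can absorb 200,000 samples per second.
n_frequent = smallest_n(1_000_000, 200_000)
n_rare = smallest_n(100, 200_000)
```

With these made-up rates a frequent event is recorded on every fifth occurrence, while a rare event is recorded on every occurrence.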



The recording of samples based on time-sampling may gather each sample
into a table, wrapping to the top of the table each time the table is
filled. If the table is large enough, a mechanism can be used to
record each table (or a part thereof) before unrecorded samples are
overlaid.

An alternative collection approach is offered in this invention, made
practical by the fact that only samples of interest are being gathered
by event-sampling. This alternate approach stops recording in the
table once the table (ITA) is filled, outputs the table entries, and
restarts table recording from its beginning to fill the table again.
This mechanism guarantees that all of L number of events will be
saved, where L is the length of the table, no matter what the rate is
for the event recording.
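
The stop-when-full approach can be modelled in a few lines of Python. The class and method names are assumptions for the sketch, and the `drain` call stands in for whatever mechanism moves the table entries to an output buffer.

```python
class StopOnFullTable:
    """Toy model: recording stops when the table is full and resumes after output."""

    def __init__(self, length):
        self.length = length   # L, the number of table entries
        self.entries = []

    def record(self, sample):
        """Store a sample; refuse it (return False) while the table is full."""
        if len(self.entries) >= self.length:
            return False       # recording is stopped until the table is drained
        self.entries.append(sample)
        return True

    def drain(self):
        """Output every entry and restart recording from the first entry."""
        out, self.entries = self.entries, []
        return out


table = StopOnFullTable(length=3)
accepted = [table.record(sample) for sample in range(5)]
saved = table.drain()
```

Events arriving while the table is full are not recorded, but the L consecutive events already in the table are never overwritten, whatever the event rate.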

These and other objects, features and advantages of the invention may
be more fully understood and appreciated by considering the following
detailed description in association with the accompanying drawings.



Brief Description Of The Drawings 

FIGURE 1 is an overall block diagram of a data processing system
containing the invention.

FIGURE 2 is a block diagram of an embodiment of the invention in any
CPU in FIGURE 1.



Detailed Description Of The Preferred Embodiment 

FIGURE 1 shows a multiprocessor (MP). It provides an ITU
(instrumentation table unit) internally in each of its CPU's, and
other elements (SCE (System Control Element), CCE (Channel Control
Element), and PCE (Processor Control Element)). A command path 151 is
shown linking the ITU's to the processor controller element (PCE),
which is associated with the system operator console from which
control over the ITU subsystem is provided, and all ITU output buffers
reside in the PCE. The output buffers are filled from the ITUs by data
transfers on path 51. When filled, each output buffer is written to a
disk output medium under control of a PCE ITU output program, which
also controls the transfers on path 51 of ITU data into the buffers
from the respective ITU arrays.

Another PCE function in support of instrumentation is to initialize
and terminate measurement runs, based on user inputs. The command
structure and logic for starting and stopping a measurement run is
like that found in the prior art and is not part of this invention.

FIGURE 1 includes the overall instrumentation structure by showing the
preferred embodiment as an ESR (event sampling recorder) within each
CPU's ITU, which also includes the timer-driven embodiment described
in the patent 4,590,550 specification which is incorporated by
reference into this specification. Thus, each ESR shown in CPU1 and
CPU2 obtains event-sampling for the respective CPU, as well as
obtaining the regular time sampling previously disclosed and claimed
in U.S.A. patent 4,590,550 using the timer-sampling pulses provided by
a regular pulse generator 54 on distribution line 55 to the ITUs. The
ITU's in the non-CPU elements, SCE and CCE, are not shown as using the
invention, and they only use regular (periodic) time sampling. Hence,
the structure in FIGURE 1 of the subject application includes the
structure disclosed in FIGURES 1 through 8 in patent 4,590,550, on to
which each CPU in the subject application has added the structure
shown in FIGURE 2 of this application.

In FIGURE 2 of this application, reference numbers less than 100 refer
to an item described in, and having the same reference number in,
patent 4,590,550; and reference numbers over 100 are for new items in
this application. Command path 151 is inclusive of command path 51 in
4,590,550.

The boxes shown in FIGURE 2 represent logic 
functions performed by circuits and microcode. 
These boxes preferably are not physical packaging 
entities. 

Measurement control is provided from a PCE 
console, where a user issues instrumentation com- 
mands and enters desired measurement character- 
istics, e.g. in an appropriate menu on a console 
display screen. The use of menus to select com- 
mands is well known in the computer arts. In FIG- 
URE 1, the command parameters are transmitted 
on path 151 to any selected ITU by a command, 
and that ITU receives and decodes the command 
in the PCE command decoder 34 and outputs 
command signals to boxes 102, 104, 107 and 108 
in which they set appropriate latches in accordance 
with the decoded command signals. The command 
operations relating to event sampling include the 
following: 

1. Selection of the sampling mode by setting the mode in
instrumentation mode selection box 100. Any of event sampling mode,
time sampling mode, or branch sampling mode may be selected in box
100. CPU signals or states used in event sampling may be classified
into the categories of (1) event signals, and (2) condition signals,
which differ in their duration. Event signals have only a short
duration, e.g. a single machine cycle. Condition signals have a longer
duration lasting many cycles, and may exist when events happen.



2. Selection of N for controlling the recording of every "Nth
occurrence" of sampling is set in the Nth event control box 109. The
selection of N is based on estimated CPU event frequency, so as to
give a data collection rate that will fill the respective ITA and
output buffer with no (or minimum) overrun. If N is too high, the
sampling rate may be too slow and fall below its optimum buffer
operation; but if N is too low, the sampling rate may be extremely
frequent, and the ITA and/or its output buffer may fill before it can
be completely read out, resulting in data loss from overruns and a
suboptimal effective output rate.

3. Selection of most of the machine signals from lines 31 A to Z is
set into the event signal collector box 104. The event signals are
chosen from the screen menu list provided by a command and they are
entered into selection triggers (not shown) in box 104. The individual
events each have a latch in box 104, in which the signal is
momentarily collected as it happens and it may be stored for a few
machine cycles. For example, the instruction counter settings will be
provided to box 104, as will each operand address, etc. (Some of lines
31 A to Z are connected only to gates 31, such as, for example, the
lines providing signals for: operation codes, address space
identifiers, the time-of-day, and storage protect keys, which,
however, can be combinatorially selected by event selection box 107 in
its output path 108A to gates 111.)

4. Selection of conditions to be monitored is set into condition
selection logic box 102. Conditions last for a substantial duration
while events last for a relatively short duration. Examples of
selectable conditions are as follows:

Special instruction states lasting over multiple instructions (e.g.
Start Interpretive Execution (SIE) instruction states);

Model dependent instruction controlling an instrumentation control and
conditioning latch (ICCLATCH), e.g. Diagnose instruction.

Instruction address within range (IAWR).

A selected condition signal(s) may be sent to box 107 where the
condition may be combined with an event signal.

5. Selection of a particular type of event, or combination of event
types (that are to be monitored for a measurement) are set in event
selection logic box 107. Various conditions can be imposed in the
command to any measurement selection. Examples of collectable CPU
signals selectable in box 107 are as follows:
Instruction first cycle (IFC);
Machine cycle;
Address compare;
Microflag event;
Cache hit;
Cache miss;
Caused XI hit/castout; and
DLAT miss.

Other examples in box 107 are combinations of 
a collected event combined with a selected con- 
dition from box 102, and are as follows: 

Cache miss on instruction fetch (IFET); 
IFC and (IAWR or ICCLATCH); 
Microflag event and (IAWR or ICCLATCH); 
Cache miss and (IAWR or ICCLATCH); 
Caused XI hit/castout and (IAWR or IC- 
CLATCH); 

DLAT miss and (IAWR or ICCLATCH); and 
IFET cache miss and (IAWR or ICCLATCH). 
6. Select an "overrun threshold number" which is compared to the
number of times the ITA overruns during a measurement run (e.g.
related to the number of samples lost due to the ITA being filled
before it can be read out). If the threshold value is exceeded, the
measurement run is automatically terminated. This function may be done
by the PCE. (The overrun threshold number may be retained in the PCE
program controlling the transfers on path 51 between each ITA and its
output buffer.)

The user also may make measurement command 
selections not solely related to event sampling, 
such as how to start and stop the instrumentation 
run, which are retained in a program in the PCE. 

In more detail, the measurement control command signals are sent on
bus 151 in FIGURE 1 from the PCE to each ITU. FIGURE 2 shows the
command bus 151 received by the PCE command decoder 34 which generates
and outputs control signals on a bus 101, which sets up the controls
in the ITU, as previously described. The selection settings may be
summarized as follows:

A. The instrumentation mode is set in box 100 to event sampling mode,
branch event sampling mode, or time sampling mode. (In event sampling
mode, or branch sampling mode, time-sampling input pulses received on
line 55 are inhibited from reaching any CPU ITU.)

B. The event selection box 107 is set to select an event, or
combination of event(s) or condition(s).

C. The condition selection box 102 may be set to select the
condition(s) that will be active during the measurement run.

D. The Nth event control box 109 is set to the value N. This sets a
counter in box 109 to the value N to output a gating signal to gates
31 on each Nth occurrence of the selected event, or the selected
combination occurring with any selected condition, so that only each
Nth occurrence of the selected event/condition will be recorded in ITA
32.

E. The event signal collector box 104 is set for selecting which of
the event signals will be latched for enabling their data collection
in the ITA. (This primes a path for selected data to pass from the CPU
into the ITU array (31 A - Z) upon occurrence of a sampling trigger
from box 109.)

F. The overrun threshold is set in the PCE for termination of the
measurement if successive overruns occur.

At some subsequent time the actual measurement begins at a time set by
an operator command. This means that selected machine state signals on
lines 31 A to Z will now be gated for recording in the ITA at an ITA
address selected by ITA address generator 33 on the occurrence of each
output signal from box 109.

When any signal occurs on lines 31 A to Z that is to be selected for a
sampling operation, the signal is sent to the ITU's event-signal
collector 104 from the CPU source where it occurs. The collector 104
latches each selected event signal and forwards it to other areas of
the ITU:

1. To event selection logic box 107 to determine whether this signal
is to be used for sampling.

2. To the ITA storage gates 31 on bus 111, where the selected signals
are gated to the ITA and recorded in the current ITA entry as the data
being collected.

The event signal bus 103 passes the condition signals, and bus 105
passes some of the event signals to condition selection box 102, where
condition specifications were set under command control. In box 102,
the signals are tested, and if selected they are latched for the
measurement run. The latch is outputted on line 108 to box 107 if the
selected condition was signalled by the CPU.

In event selection box 107, selected signals are
checked for a match against the events previously
specified from command path 101 during
instrumentation initialization. Only when a selected
signal match is found in box 107 is an output pulse
forwarded on path 108 to the Nth event control box
109.

In Nth event controls 109, the counter (CTR) is
incremented by each occurrence of the
event(s)/condition(s) selected for measurement. Each
time the incremented counter reaches value N, a
sampling signal is outputted on path 110 to gates 31 to
enable the recording in the ITA of the selected set
of signals in the latched set provided on bus 111
from collector 104, and then the counter is reset to
zero.
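
The counting behavior of the Nth event controls 109 can be illustrated with a short software sketch. This is a minimal Python model and not the disclosed circuitry; the class name NthEventControl and the method on_event are hypothetical names introduced only for illustration.

```python
class NthEventControl:
    """Hypothetical software model of the Nth event controls (box 109)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("N must be at least 1")
        self.n = n
        self.ctr = 0  # counter (CTR), starts at zero

    def on_event(self) -> bool:
        """Count one selected event; return True when a sampling signal fires."""
        self.ctr += 1
        if self.ctr == self.n:
            self.ctr = 0  # counter is reset after the sampling signal
            return True
        return False


ctl = NthEventControl(n=3)
fired = [ctl.on_event() for _ in range(7)]
```

With N set to 3 in this sketch, the sampling signal fires on the third and sixth events only.
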

Also, the signal on path 110 is provided to the
ITA address generator 33 where it causes a table
address counter (TAC) to be incremented. (TAC is
a six-bit counter for an ITA data array having 64
entries. The ITU array must have sufficient holding
capacity to avoid overruns under normal
measurement circumstances.)

This address incrementing logic may be the 
same for time, branch and event sampling modes. 
However, as mentioned above, event sampling 
mode (unlike time sampling mode) is inherently 
asynchronous between CPUs, so that output
control for the ITA is different for event and branch
mode sampling than for time mode sampling.

As noted, event sampling may be made con- 
ditional on the current instruction address falling 
within a given range, IAWR (e.g. the PER registers 
in the S/370 implementation); or on a special latch 
(ICCLATCH) in box 102 (but not shown) having 
been set by special state instructions (e.g. diag- 
nose or SIE) placed in the code being measured in 
order to signal entry to and exit from routines of 
interest. Condition control allows another dimension 
of selectivity, in that sampling can be restricted by 
address range or dynamically turned on and off 
under program control for certain CPU states or 
CPU model dependencies. 
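
The two kinds of condition described above can be sketched in software. The following Python model is only an illustration under stated assumptions: the class ConditionSelect and its field names are hypothetical, the address bounds are treated as inclusive, and requiring every selected condition to hold at once is an assumption of the sketch, not something the disclosure specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionSelect:
    """Hypothetical software model of condition selection (box 102)."""

    range_lo: Optional[int] = None  # inclusive lower bound of the address range
    range_hi: Optional[int] = None  # inclusive upper bound of the address range
    use_latch: bool = False         # whether the latch condition is selected
    latch: bool = False             # stands in for ICCLATCH

    def satisfied(self, instr_addr: int) -> bool:
        """Return True if every selected condition currently holds."""
        if self.range_lo is not None and self.range_hi is not None:
            if not (self.range_lo <= instr_addr <= self.range_hi):
                return False
        if self.use_latch and not self.latch:
            return False
        return True


by_range = ConditionSelect(range_lo=0x1000, range_hi=0x1FFF)
inside = by_range.satisfied(0x1800)
outside = by_range.satisfied(0x2000)

by_latch = ConditionSelect(use_latch=True)
before_set = by_latch.satisfied(0x1800)
by_latch.latch = True  # e.g. set by a special instruction at routine entry
after_set = by_latch.satisfied(0x1800)
```

In this sketch an event would be passed on for counting only while satisfied() returns True.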

The branch mode uses branch-taken events 
which differ from other events and conditions, in 
that the purpose of recording branches-taken is not 
primarily to study the hardware characteristics of 
code execution (such as for cache behavior), but to 
document the primary paths of program control 
flow. In such case, the sequence of branch-taken 
events, rather than their aggregate frequency, is 
the objective; and this places some special con- 
straints on implementation, which herein treats 
branch sampling as a separate instruction mode. 
Conceptually, however, sampling on a branch event 
is clearly an instance of event sampling. 

Recording in event sampling mode is governed by
the setting of N. For frequent events, recording only
on the Nth event occurrence is necessary to avoid
filling the ITA faster than its recorded content can
be moved into its output buffer, which would cause
a buffer overrun.

When CPU state data is sampled at fixed
intervals (i.e. time sampling) and recorded for later
analysis (to study the interaction between programs
and computer structure, and other performance
relationships), a problem may exist that some events
happen at rates that are difficult to sample, either
because they are so frequent that the demanded
recording data rates are higher than feasible to
record the number of samples taken, or because
they happen so seldom and over a very short
duration that a large number of samples must be
taken over a very long period of time to get a
statistically meaningful number of samples that
include the event of interest.
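
The difficulty with rare, short events under time sampling can be made concrete with simple arithmetic. The sketch below assumes that time samples land uniformly over the run, so that the expected share of samples catching the event equals the share of machine cycles in which the event is active. The function name and the figures used are hypothetical and are not taken from the disclosure.

```python
def samples_needed(target_hits: int, event_cycles: int, total_cycles: int) -> int:
    """Expected number of time samples needed to catch the event target_hits times.

    Assumes uniform sampling: each sample catches the event with
    probability event_cycles / total_cycles.
    """
    if event_cycles < 1 or total_cycles < event_cycles:
        raise ValueError("need 1 <= event_cycles <= total_cycles")
    # Integer ceiling of target_hits * total_cycles / event_cycles.
    return -(-target_hits * total_cycles // event_cycles)


# An event active in 1 of every 1000 cycles, with 100 hits wanted.
rare = samples_needed(target_hits=100, event_cycles=1, total_cycles=1000)
```

Under these assumed figures about 100,000 time samples would be needed, whereas sampling on the event itself would need only 100 recordings.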

Hence, event-driven sampling captures the
state of the machine at certain specified events,
where the events are the occurrence of a specific
state, or the fact that a branch instruction has been
executed effecting a change in the instruction
stream. In other words, in event sampling mode,
sampling is not done at arbitrary timer intervals, but
only whenever an event (specific state or true
branch) has occurred. Multiple sub-elements of the
processor may participate in an event sampling
run.

Thus, selected conditions within the processor
may be used to condition the sampling of selected
event(s), such as the condition of the current
instruction being within the range of the S/370 PER
(program-event recording) registers (IAWR), or the
condition that a unique latch has been set through
a state-controlling instruction (e.g. SIE or diagnose
in S/370XA).

The output buffer control for event sampling is
as follows:

When the ITA is full (i.e. indicated by
incrementing the TAC to its highest count), inputting
from gates 31 into ITA 32 is inhibited, and a signal
is provided on line 33B to cause the outputting of
the ITA to the associated out buffer in the PCE
storage. Then TAC is reset to the first ITA address
on its address bus, and the ITA is again filled, etc.
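
The fill, inhibit, output and reset cycle described above can be sketched as a small software model. This is a hypothetical Python illustration only: the class SingleTacIta, its methods, and the count of samples dropped while input is inhibited are assumptions of the sketch, and the table is shrunk to four entries in the usage lines to keep the trace short.

```python
class SingleTacIta:
    """Hypothetical model of an ITA addressed by a single TAC."""

    def __init__(self, entries: int = 64) -> None:
        self.table = [None] * entries
        self.tac = 0            # table address counter
        self.inhibited = False  # True while the full ITA awaits output
        self.out_buffer = []    # stands in for the out buffer in PCE storage
        self.dropped = 0        # samples arriving while input is inhibited

    def store(self, sample) -> bool:
        """Try to record one sample; return False if input is inhibited."""
        if self.inhibited:
            self.dropped += 1
            return False
        self.table[self.tac] = sample
        self.tac += 1
        if self.tac == len(self.table):  # ITA is full
            self.inhibited = True
        return True

    def output(self) -> None:
        """Move the full ITA to the out buffer and reset TAC."""
        if self.inhibited:
            self.out_buffer.extend(self.table)
            self.tac = 0
            self.inhibited = False


ita = SingleTacIta(entries=4)
stored = [ita.store(s) for s in range(5)]  # the fifth sample finds the ITA full
ita.output()
stored_after = ita.store(9)
```

In this sketch the fifth sample is refused because the table is full and has not yet been moved out; after output() the table accepts samples again from its first address.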

A second table address counter (not shown)
may be provided in box 33 to provide the input
address to the ITA, so that inputting via gates 31
may continue into half of the ITA while its other half
is outputted under control of addresses generated
by TAC. Thus by outputting half of the ITA at a
time, the other half is concurrently available to
receive event samples. The outputting of the first
and second halves of ITA is controlled by a
first-half full signal on line 33A and a second-half full
signal on line 33B. (Alternatively, when only a
single TAC is used for both input and output of the
ITA, inputting to the ITA is locked out, i.e. inhibited,
during its outputting to avoid potential
interference.)
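
The two-half arrangement can likewise be sketched in software. The Python model below is a hypothetical illustration: the class TwoHalfIta and its methods are invented names, the half-full signals are represented by the strings "first" and "second", and each half is assumed to be moved out before input wraps around to it again, so overruns are not modelled.

```python
class TwoHalfIta:
    """Hypothetical model of an ITA filled in one half while the other is output."""

    def __init__(self, entries: int = 64) -> None:
        self.table = [None] * entries
        self.half = entries // 2
        self.in_addr = 0      # second counter supplying the input address
        self.out_buffer = []  # stands in for the out buffer in PCE storage

    def store(self, sample):
        """Record one sample; return which half just became full, if any."""
        self.table[self.in_addr] = sample
        self.in_addr = (self.in_addr + 1) % len(self.table)
        if self.in_addr == self.half:
            return "first"   # first-half full signal (line 33A)
        if self.in_addr == 0:
            return "second"  # second-half full signal (line 33B)
        return None

    def output_half(self, which: str) -> None:
        """Copy the named half to the out buffer."""
        start = 0 if which == "first" else self.half
        self.out_buffer.extend(self.table[start:start + self.half])


ita2 = TwoHalfIta(entries=4)
signals = []
for sample in range(6):
    full = ita2.store(sample)
    signals.append(full)
    if full is not None:
        ita2.output_half(full)
```

Here input continues into one half while the other half is copied out, so no sample is refused as long as output keeps pace with input.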

When the Nth signal on bus 110 reaches the
ITA control gates 31, that enables the storing of an
event sample in the ITA 32 at the current address
indicated by TAC.

The measurement operations for an ITU during
event sampling are eventually terminated according
to the commands which specified the
measurement.

The data flow in the CPU involves its various
sub-elements that forward instrumentation data
through lines 31A to 31Z as the machine signal
interface to the ITU. These data signal inputs are
the same for event sampling mode as they are for
time sampling mode, except that for event
sampling the event-related signals are forwarded via
paths 103, 105 and 111A for the processing of
selected event(s)/condition(s) into recording
samples. Storing in an ITA entry can occur only at the
instant when a signal is provided on bus 110, so
that each sample inputted into the ITA 32
corresponds to a single occurrence of the selected
event(s)/condition(s) upon each Nth occurrence.

If address generator 33 uses only one TAC
counter for both ITA input addressing and output
addressing, the CPU signals received by the ITA
gates 31 are stored in the ITA only if: 1) a trigger
signal on bus 110 is provided from Nth event
control 109, and 2) the ITA input is not locked while
the TAC in the address generator 33 outputs the
filled ITA content to its out buffer.

A requirement for the implementation of event
sampling is that the data for recording a sample
inputted to ITA 32 must convey the machine state
existing at the time the pertinent machine signals
were generated in the CPU. In other words, the
machine state data recorded in the ITA must be
reasonably contemporary with the signals
representing the occurrence of the event. This
requirement might fail to be met if substantial delay were
to occur in the signal processing in boxes 104,
102, 107, 109 and 31, or through some kind of
lookahead that tries to anticipate a future machine
state.

In CPUs comprised of very high speed logic
circuits, it may not be possible in a single machine
cycle to operate control logic 102, 104, 107, 109
and gates 31, and still be able to record the
selected data in ITA 32 in the same cycle, or even in
the next cycle. In such case, a greater delay may
become necessary for sampling the machine states
in the ITA, for example three cycles after the
occurrence of a selected event signal. In such
case, the data path to the ITA may not be precisely
timed with the CPU generation of the event signals.

If such substantial delay exists, the recorded 
sample data will to some extent not fully represent 
the true event environment. The degree of repre- 
sentation loss will depend on the amount of delay. 

The PCE activity during system measurement 
controls the operation of the associated out buffer. 
The PCE monitors each out buffer and causes it to 
be written to disk when full. The PCE also logs a 
count of overruns of each ITA. Overruns indicate 
data loss; and depending on their frequency,
overruns may affect the measurement accuracy of a
run. If the user has specified an overrun threshold
as a particular number of overruns, a measurement 
run may be terminated if the overrun threshold is 
reached. 
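
The overrun bookkeeping described above can be sketched as follows. This Python model is a hypothetical illustration: the class OverrunLog and its method are invented names, and applying the threshold separately to each ITA's count is an assumption, since the text does not say whether the threshold is per ITA or system-wide.

```python
from typing import Dict, Optional


class OverrunLog:
    """Hypothetical model of the PCE's per-ITA overrun count."""

    def __init__(self, threshold: Optional[int] = None) -> None:
        self.threshold = threshold       # None means no threshold was specified
        self.counts: Dict[str, int] = {}

    def log_overrun(self, ita_id: str) -> bool:
        """Log one overrun; return True if the run should be terminated."""
        self.counts[ita_id] = self.counts.get(ita_id, 0) + 1
        if self.threshold is None:
            return False
        return self.counts[ita_id] >= self.threshold


log = OverrunLog(threshold=2)
first = log.log_overrun("cpu0")
other = log.log_overrun("cpu1")
second = log.log_overrun("cpu0")  # cpu0 reaches the threshold here
```

In this sketch the run would be ended when the ITA labelled cpu0 records its second overrun.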

Claims 

1. Internal processor instrumentation monitoring
means for obtaining event controlled measurement
data on the software/hardware operation of a data
processing system, including at least one CPU and
I/O control, the monitoring means comprising:

at least one instrumentation table unit (ITU), each
ITU being embedded internally in local proximity to
signals which may be monitored;

the ITU having an instrumentation table array (ITA)
that includes a plurality of entries, each entry being
capable of storing an input signal to the ITA while
the entry is being addressed;

gating means for connecting a selected set of
internal signal lines as input signals to the ITA;

addressing means for selecting a current entry in
the ITA and for enabling the current ITA entry to
receive and record the received state of the input
signals to the ITA;

means for selecting and detecting event-related 
signals and generating an event-sampling signal for 
monitoring; 

event control means for receiving the
event-sampling signal from the selecting and detecting
means to signal the gating means and the
addressing means that selected event-related signals are
to be recorded in the current ITA entry after which
the addressing means is to address the next entry
in the ITA as the current entry;

a set of recorded ITA signal states in an ITA entry
being determined by the occurrences of the
event-sampling signal;

a collection of event-related signals being recorded
in entries of the ITA over an interval of time, or
over a predetermined number of event
occurrences, to provide an instrumentation
measurement;

means for connecting the ITU to output recording
means for storing the recorded ITA entries to
collect a statistically significant number of ITA entries
during a period of ITU measurement activity.

2. Internal processor event-controlled
instrumentation monitoring means as defined in Claim 1,
comprising:

condition selection means for being set by a con- 
dition selected for limiting the recording of instru- 
mentation information in the ITA to events occur- 
ring during the existence of the selected condition. 

3. Internal processor event-controlled instru- 
mentation monitoring means as defined in Claim 1 
or 2, the monitoring means comprising: 

counting means operating with a selectable modulo
N for counting the number of event-sampling
signals received from the event control means, the
event control means outputting a signal on each
modulo N count to the gating means and the
addressing means for enabling the recording of
signals in the ITA only upon the Nth occurrence of
the event-sampling signals.

4. Internal processor event-controlled
instrumentation monitoring means as defined in Claim 1
or 2, comprising:

instrumentation mode selecting means for selecting 
an event-driven sampling mode, or a timer-driven 
sampling mode. 

5. Internal processor event-controlled instru- 
mentation monitoring means as defined in one of 
Claims 1 to 4, the instrumentation mode selecting 
means comprising: 

a branch-taken mode selecting means for the spe- 
cial case of measuring branch-taken events. 

6. Internal processor event-controlled instru- 
mentation monitoring means as defined in one of 
Claims 1 to 5, the condition-selection means com- 
prising: 

latch means for being set on when a
predetermined instruction is executed;

means for setting off the latch means by either an
occurrence of an event or the execution of a
predetermined instruction;

an output of the latch means providing an output 
for the condition-selection means for indicating a 
processor condition that exists during the execution 
of one or more subsequent instructions. 

7. Internal processor event-controlled instru- 
mentation monitoring means as defined in one of 
Claims 1 to 6, the condition-selection means com- 
prising: 

a model-dependent instruction being the
predetermined instruction which sets on the latch means.



8. Internal processor event-controlled instru- 
mentation monitoring means as defined in one of 
Claims 1 to 7, the condition-selection means com- 
prising: 

an emulation type of instruction being the
predetermined instruction which sets on the latch means.

9. Internal processor event-controlled
instrumentation monitoring means as defined in one of
Claims 1 to 8, the condition-selection means
comprising:

means for sensing when the instruction address
register contains an address within a selected
range of addresses.

10. Internal processor event-controlled instru- 
mentation monitoring means as defined in one of 
Claims 1 to 9, the monitoring means comprising: 

event signal collector means connected to the
internal signal lines for providing event-related
signals to the ITU;

temporary storing means in the collector for
selecting a set of the input event-related signals for an
instrumentation measurement.

11. Internal processor event-controlled instru- 
mentation monitoring means as defined in one of 
Claims 1 to 10, the monitoring means comprising: 

event selection logic means receiving the selected 
set of input event-related signals from the collector 
means and selecting one or a combination of the 
signals for generating the event-sampling signal. 

12. Internal processor event-controlled
instrumentation monitoring means as defined in one of
Claims 1 to 11, comprising:

a command controller means connected between
the ITU and system console hardware for sending
measurement selection command signals to the
ITU to set ITU selections of internal signals and
conditions for an event-controlled instrumentation
measurement of activity in the processor.

13. Internal processor event-controlled
instrumentation monitoring means as defined in one of
Claims 1 to 12, comprising:

an output buffer for receiving the content of the
ITA;

means for initiating the transfer of the content of 
the ITA to the output buffer; 

means for inhibiting the input to the ITA from when
the ITA is full until the content of the ITA has been
transferred to the output buffer.